NSF PAR Search | NSF Public Access Repository

Robust estimation in regression and classification methods for large dimensional data

https://doi.org/10.1007/s10994-023-06349-2

Zhang, Chunming; Zhu, Lixing; Shen, Yanbo (September 2023, Machine Learning)

Assessment of Projection Pursuit Index for Classifying High Dimension Low Sample Size Data in R

https://doi.org/10.6339/23-JDS1096

Wu, Zhaoxing; Zhang, Chunming (March 2023, Journal of Data Science)

Analyzing “large p small n” data is becoming increasingly important in a wide range of application fields. As a projection pursuit index, the Penalized Discriminant Analysis (PDA) index, built upon the Linear Discriminant Analysis (LDA) index, was devised in Lee and Cook (2010) to classify high-dimensional data, with promising results. Yet little information is available about its performance compared with the popular Support Vector Machine (SVM). This paper conducts extensive numerical studies to compare the performance of the PDA index with the LDA index and SVM, demonstrating that the PDA index is robust to outliers and able to handle high-dimensional datasets with extremely small sample sizes, few important variables, and multiple classes. Analyses of several motivating real-world datasets reveal the practical advantages and limitations of individual methods, suggesting that the PDA index provides a useful alternative tool for classifying complex high-dimensional data. These new insights, along with the hands-on implementation of the PDA index functions in the R package classPP, help statisticians and data scientists make effective use of both sets of classification tools.
Empirical likelihood inference in autoregressive models with time-varying variances

https://doi.org/10.1080/24754269.2021.1913977

Han, Yu; Zhang, Chunming (May 2022, Statistical Theory and Related Fields)

Covariance function versus covariance matrix estimation in efficient semi-parametric regression for longitudinal data analysis

https://doi.org/10.1016/j.jmva.2021.104900

Jia, Shengji; Zhang, Chunming; Lu, Haoran (January 2022, Journal of Multivariate Analysis)

Further Examples Related to Correlations Between Variables and Ranks

https://doi.org/10.1080/00031305.2020.1831956

Zhang, Chunming (April 2021, The American Statistician)
On simultaneous calibration of two-sample t-tests for high-dimension low-sample-size data

https://doi.org/10.5705/ss.202018.0467

Zhang, Chunming; Jia, Shengji; Wu, Yongfeng (January 2021, Statistica Sinica)
A Computational Perspective on Projection Pursuit in High Dimensions: Feasible or Infeasible Feature Extraction

https://doi.org/10.1111/insr.12517

Zhang, Chunming; Ye, Jimin; Wang, Xiaomei (August 2022, International Statistical Review)

Summary: Finding a suitable representation of multivariate data is fundamental in many scientific disciplines. Projection pursuit (PP) aims to extract interesting ‘non‐Gaussian’ features from multivariate data, and tends to be computationally intensive even when applied to data of low dimension. In high‐dimensional settings, a recent work (Bickel et al., 2018) on PP addresses asymptotic characterization and conjectures of the feasible projections as the dimension grows with sample size. To gain practical utility of and learn theoretical insights into PP in an integral way, data analytic tools needed to evaluate the behaviour of PP in high dimensions become increasingly desirable but are less explored in the literature. This paper focuses on developing computationally fast and effective approaches central to finite sample studies for (i) visualizing the feasibility of PP in extracting features from high‐dimensional data, as compared with alternative methods, and (ii) assessing the plausibility of PP in cases where asymptotic studies are lacking or unavailable, with the goal of better understanding the practicality, limitations and challenges of PP in the analysis of large data sets.
Maximum Independent Component Analysis with Application to EEG Data

https://doi.org/10.1214/19-STS763

Guo, Ruosi; Zhang, Chunming; Zhang, Zhengjun (February 2020, Statistical Science)